Overleaf Example

Neural Information Processing Systems

Large transformer-based foundation models have been commonly used as pre-trained models that can be adapted to different challenging datasets and settings with state-of-the-art generalization performance.


Does Knowledge Distillation Really Work? (Supplemental Material)

Neural Information Processing Systems

Does Knowledge Distillation Really Work? Here we briefly describe key implementation details to reproduce our experiments. Data augmentation details are given in A.1, followed by architecture details in A.2, and finally training details are provided in A.3. The reader is encouraged to consult the included code for closer inspection. A.1 Data augmentation procedures Some of the data augmentation procedures we consider attempt to generate data that is close to the train data distribution (standard augmentations, GAN, mixup). Others (random noise, out-of-domain data) produce data for distillation that the teacher would never encounter during normal supervised training.
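One of the near-distribution augmentations named above, mixup, is simple enough to sketch directly (an illustrative numpy version, not the paper's exact implementation; the `alpha` default and function name are assumptions):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two training examples and their (soft) labels with a
    Beta-distributed mixing weight, as in standard mixup augmentation."""
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

For distillation, the teacher is then queried on the blended inputs, so the student sees data between training points rather than only on them.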


Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models

Berry, Lucas, Brando, Axel, Chang, Wei-Di, Higuera, Juan Camilo Gamboa, Meger, David

arXiv.org Artificial Intelligence

Estimating uncertainty in text-to-image diffusion models is challenging because of their large parameter counts (often exceeding 100 million) and operation in complex, high-dimensional spaces with virtually infinite input possibilities. In this paper, we propose Epistemic Mixture of Experts (EMoE), a novel framework for efficiently estimating epistemic uncertainty in diffusion models. EMoE leverages pre-trained networks without requiring additional training, enabling direct uncertainty estimation from a prompt. We leverage a latent space within the diffusion process that captures epistemic uncertainty better than existing methods. Experimental results on the COCO dataset demonstrate EMoE's effectiveness, showing a strong correlation between uncertainty and image quality. Additionally, EMoE identifies under-sampled languages and regions with higher uncertainty, revealing hidden biases in the training set. This capability demonstrates the relevance of EMoE as a tool for addressing fairness and accountability in AI-generated content.
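The abstract does not spell out the estimator, but the underlying idea of measuring disagreement between ensemble members in a latent space can be sketched as follows (a minimal illustration; the variance-based score and names are assumptions, not EMoE's actual formula):

```python
import numpy as np

def latent_disagreement(latents):
    """Epistemic-uncertainty proxy: total variance across ensemble members'
    latent vectors for the same prompt. `latents` has shape (n_experts, d);
    identical latents give zero, divergent latents give a large score."""
    latents = np.asarray(latents, dtype=float)
    return float(np.var(latents, axis=0).sum())
```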


NBMLSS: probabilistic forecasting of electricity prices via Neural Basis Models for Location Scale and Shape

Brusaferri, Alessandro, Ramin, Danial, Ballarino, Andrea

arXiv.org Artificial Intelligence

Forecasters using flexible neural networks (NNs) in multi-horizon distributional regression setups often struggle to gain detailed insight into the mechanisms that produce the predicted feature-conditioned distribution parameters. In this work, we deploy a Neural Basis Model for Location, Scale and Shape, which blends the principled interpretability of GAMLSS with a computationally scalable shared basis decomposition, combined via linear projections that support dedicated stepwise, parameter-wise aggregation of feature shape functions. Experiments were conducted on multiple market regions, achieving probabilistic forecasting performance comparable to that of distributional neural networks while providing more insight into model behavior through the learned nonlinear feature-level maps to the distribution parameters across the prediction steps.

Introduction. Probabilistic forecasting of hourly electricity prices in day-ahead power markets (PEPF) is a complex problem with significant impact. Such forecasts enable informed decision-making in high-stakes scenarios such as trading strategies, resource scheduling, and optimal commitment by factoring in potential fluctuations and associated risks [2]. Moreover, electricity prices are characterized by high volatility and rapid changes driven by intricate factors, including distributed power demand, generation costs, and weather conditions [3].
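A toy sketch of a shared-basis distributional head in the spirit described above (illustrative only; the tanh basis, softplus link, and parameter names are assumptions, not the paper's architecture):

```python
import numpy as np

def nbm_head(x, W_basis, w_loc, w_scale):
    """One set of shared nonlinear basis functions feeds separate linear
    projections, one per distribution parameter (location and scale here),
    keeping each parameter's dependence on the basis interpretable."""
    h = np.tanh(x @ W_basis)                 # shared nonlinear basis
    mu = h @ w_loc                           # location (mean) parameter
    sigma = np.logaddexp(0.0, h @ w_scale)   # softplus keeps scale positive
    return mu, sigma
```

Because each distribution parameter is a linear map over the same basis, the learned per-feature shape functions can be inspected directly, which is the interpretability argument made above.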


Error Diversity Matters: An Error-Resistant Ensemble Method for Unsupervised Dependency Parsing

Shayegh, Behzad, Lee, Hobie H. -B., Zhu, Xiaodan, Cheung, Jackie Chi Kit, Mou, Lili

arXiv.org Artificial Intelligence

We address unsupervised dependency parsing by building an ensemble of diverse existing models through post hoc aggregation of their output dependency parse structures. We observe that these ensembles often suffer from low robustness against weak ensemble components due to error accumulation. To tackle this problem, we propose an efficient ensemble-selection approach that avoids error accumulation. Results demonstrate that our approach outperforms each individual model as well as previous ensemble techniques. Additionally, our experiments show that the proposed ensemble-selection method significantly enhances the performance and robustness of our ensemble, surpassing previously proposed strategies, which have not accounted for error diversity.
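As a rough illustration of post hoc aggregation (not the paper's selection algorithm), one can score each component parse by its edge agreement with the rest of the ensemble, so that weak components with idiosyncratic errors are outvoted; here `parses` holds one head index per token:

```python
def edge_agreement(p, q):
    """Fraction of tokens whose predicted head matches between two parses."""
    return sum(a == b for a, b in zip(p, q)) / len(p)

def most_central_parse(parses):
    """Pick the parse that agrees most with the whole ensemble
    (self-agreement is a constant 1.0 offset for every candidate)."""
    return max(parses, key=lambda p: sum(edge_agreement(p, q) for q in parses))
```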


Shedding Light on Large Generative Networks: Estimating Epistemic Uncertainty in Diffusion Models

Berry, Lucas, Brando, Axel, Meger, David

arXiv.org Artificial Intelligence

Generative diffusion models, notable for their large parameter count (exceeding 100 million) and operation within high-dimensional image spaces, pose significant challenges for traditional uncertainty estimation methods due to computational demands. In this work, we introduce an innovative framework, Diffusion Ensembles for Capturing Uncertainty (DECU), designed for estimating epistemic uncertainty for diffusion models. The DECU framework introduces a novel method that efficiently trains ensembles of conditional diffusion models by incorporating a static set of pre-trained parameters, drastically reducing the computational burden and the number of parameters that require training. Additionally, DECU employs Pairwise-Distance Estimators (PaiDEs) to accurately measure epistemic uncertainty by evaluating the mutual information between model outputs and weights in high-dimensional spaces. The effectiveness of this framework is demonstrated through experiments on the ImageNet dataset, highlighting its capability to capture epistemic uncertainty, specifically in under-sampled image classes.
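The pairwise-distance idea can be sketched for the simplest case, an ensemble of isotropic Gaussian predictors with a shared variance (a toy Kolchinsky–Tracey-style estimate; the actual PaiDEs support more general component distances):

```python
import numpy as np

def pairwise_mi_estimate(means, sigma):
    """Pairwise-distance estimate of the mutual information between a
    model's output and the ensemble member that produced it.
    means: (n_members, d) predicted means; sigma: shared isotropic std."""
    means = np.asarray(means, dtype=float)
    sq = ((means[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    kl = sq / (2.0 * sigma ** 2)   # KL between equal-variance Gaussians
    return float(-np.log(np.exp(-kl).mean(axis=1)).mean())
```

When all members agree (identical means) the estimate is zero; as the means separate it approaches log(n_members), the entropy of the member index, so only pairwise distances are needed rather than integrals in the high-dimensional output space.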


EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

Wen, Yuqiao, Shayegh, Behzad, Huang, Chenyang, Cao, Yanshuai, Mou, Lili

arXiv.org Artificial Intelligence

Machine translation is a widely applicable NLP task that translates a text from a source language to a target language Brown et al. (1990); Bahdanau et al. (2015). The Transformer architecture Vaswani et al. (2017) and pretrained large language models Radford et al. (2019); Raffel et al. (2020); Lewis et al. (2020) have largely improved translation performance, especially in the supervised setting, where a model can learn from large volumes of parallel corpora. However, machine translation remains challenging for low-resource languages, because there are not enough data for large neural networks to learn these languages. We specifically focus on multilingual translation in the zero-shot setting, where the system is required to translate between unseen language pairs. Since collecting parallel data and training individual models for every translation pair are prohibitively expensive, it is common to build a single multilingual system Johnson et al. (2017); Fan et al. (2021) that can perform translation for all language pairs, most of which are zero-shot translation directions with few exceptions (e.g., English). These models work by prepending a language-indicator token, and zero-shot ability emerges as the model generalizes from trained language pairs to unseen ones (Liu et al., 2021; Wicks and Duh, 2022).
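The language-indicator mechanism itself is tiny: the system prepends a target-language token to the source sequence and the decoder conditions on it (the `<2xx>` token format below is one common convention, not necessarily the one used by these systems):

```python
def with_lang_token(source_tokens, target_lang):
    """Prepend a target-language indicator token, e.g. <2de> for German,
    so a single multilingual model can serve every translation direction."""
    return [f"<2{target_lang}>"] + list(source_tokens)
```

Zero-shot directions then require no new mechanism: an unseen pair is just an unseen combination of source text and indicator token.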


Feature Preprocessor in Automated Machine Learning

#artificialintelligence

The performance of an automated machine learning (AutoML) workflow depends on how we process and feed different types of variables to the model, because most machine learning models accept only numerical variables. Categorical feature encoding is therefore a necessary step in any automated machine learning approach. It not only elevates model quality but also helps with better feature engineering. There are two major feature reduction strategies: principal component analysis (PCA) and feature selection. PCA is widely used in current AutoML frameworks because it reduces the dimensionality of large datasets, making machine learning practical when the original data are inherently high dimensional.
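The two preprocessing steps mentioned above can be sketched with numpy alone (a minimal illustration; real AutoML frameworks use richer encoders, scaling, and component-selection heuristics):

```python
import numpy as np

def one_hot(column):
    """Encode a categorical column as one-hot vectors so that models
    expecting numerical input can consume it."""
    cats = sorted(set(column))
    index = {c: i for i, c in enumerate(cats)}
    out = np.zeros((len(column), len(cats)))
    for row, c in enumerate(column):
        out[row, index[c]] = 1.0
    return out, cats

def pca_project(X, k):
    """Reduce dimensionality by projecting centered features onto the
    top-k principal components (right singular vectors)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```

In a pipeline, the one-hot matrix would typically be concatenated with the numeric columns before the PCA step.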